Create your .csv file. You need to do this in Excel, using the data below.
| id | water | redcow |
|---|---|---|
| 1 | 77 | 68 |
| 2 | 79 | 53 |
| 3 | 85 | 50 |
| 4 | 96 | 63 |
| 5 | 62 | 82 |
Your data should be in long format and have three columns:
id, drink, and sleep.
It should look like the image below:
Make sure you save it as a .csv file.
Next we need to call any packages!
install.packages("tidyverse") #installs the tidyverse package if you don't have it already
library(tidyverse) # calls the tidyverse package
Import your newly created data file into RStudio using
read_csv().
Make sure your working directory is set up correctly.
If this works, you should see mydata appear in the
environment (top right panel).
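A minimal import sketch (the filename sleepdata.csv is just an example; use whatever you named your own file):

```r
library(tidyverse) # provides read_csv()

# "sleepdata.csv" is an example name, not the required one
mydata <- read_csv("sleepdata.csv")
```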
Today we will just quickly check our data.
head(mydata) # shows the first few rows of your data.
names(mydata) # shows the variable names.
summary(mydata) # provides summary statistics of a data set.
We need to call any packages!
# we want to produce some basic descriptive statistics for our data set.
# we know that we want to compare redcow with water, so use the same code from last week to produce mean values.
descriptives_bygroup <- mydata %>%
group_by(drink) %>%
summarise(m = mean(sleep), sd = sd(sleep))
# now we will print our descriptive stats in something called a tibble
print(descriptives_bygroup) # print to console
view(descriptives_bygroup) # open in new tab; this is usually the best option
Run an independent t-test comparing sleep quality scores for the redcow vs. water groups.
# we want to run a t-test to compare the sleep scores (dependent variable) across the two drink groups (independent variable: redcow and water)
redcow_t <- t.test(sleep ~ drink, mydata, var.equal = FALSE)
print(redcow_t)
# we also want to find and report the effect size. for a t-test we can use Cohen's d.
mydata %>%
cohens_d(sleep ~ drink, var.equal = FALSE)
# we want to check any necessary assumptions that might change how we interpret our model
# we do not need to look at homogeneity of variance when using Welch's t-test.
# we only need to look at normality.
hist(mydata$sleep[mydata$drink == "redcow"]) #creates histogram for redcow
hist(mydata$sleep[mydata$drink == "water"]) #creates histogram for water
shapiro.test(mydata$sleep[mydata$drink == "redcow"]) #runs Shapiro test for redcow
shapiro.test(mydata$sleep[mydata$drink == "water"]) #runs Shapiro test for water
You will practice graphing your findings next week. But for those of you who want a head start, feel free to run the code below to visually present your descriptive statistics.
# use the code to create a box plot of your descriptive statistics.
ggplot(mydata, aes(x = drink, y = sleep)) +
geom_boxplot() +
labs(title = "Sleep Quality Score for RedCow vs Water",
x = "Drink",
y = "Mean Sleep Quality Score") +
  theme_classic()
We need to call any packages!
install.packages("rstatix")
install.packages("effectsize")
library(tidyverse)
library(rstatix)
library(effectsize)
It looks as though the data are not in long format.
We need to create a new data set with the data in long format.
Thankfully we can do this with some simple code:
mydata_long <- mydata %>% # creates a new object called mydata_long
pivot_longer(cols = c(before, after), # use pivot_longer() and select the column names.
names_to = "time", # give a column name for our independent variable
values_to = "bpm") # give a column name for our dependent variable
Check that mydata_long has appeared in the environment
(top right panel)
Adapt the code below to produce descriptive statistics.
You need to use the variable names from your long data file:
time and bpm.
Change NULL to the relevant details.
descriptives <- NULL %>% # Which data set will you use?
group_by(NULL) %>% # what should you split the file by?
summarise(mean = mean(NULL), sd = sd(NULL)) # mean and standard deviation of the dependent variable
view(NULL)
We can also look at a box plot. Use the code below:
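The box plot code itself isn't shown here; a minimal sketch using the long-format variable names created above (time and bpm in mydata_long):

```r
# a sketch, assuming mydata_long with columns time and bpm as created above
ggplot(mydata_long, aes(x = time, y = bpm)) +
  geom_boxplot() +
  theme_classic()
```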
Run a paired-samples t-test comparing BPM before and after drinking RedCow.
# we want to run a t-test to compare the bpm (dependent variable) across time (levels: before vs. after)
before <- mydata_long$bpm[mydata_long$time == "before"] # tell R which data is "before"
after <- mydata_long$bpm[mydata_long$time == "after"] # tell R which data is "after"
redcow_t <- t.test(before, after, paired = TRUE)
print(redcow_t)
# we also want to find and report the effect size. for a t-test we can use Cohen's d.
cohens_d(before, after, paired = TRUE)
Use the lines of code below to check the assumption of
normality.
Remember to check back on the lecture slides if you need.
Firstly, we need to calculate a difference score.
# we need to check normality, but for the difference scores (see the lecture if this doesn't make sense)
# first we need to calculate the difference score
# we will go back to the original data file "mydata" for this, not "mydata_long"
diff_score = mydata$before - mydata$after
hist(diff_score) #creates histogram for the diff_score
shapiro.test(diff_score) # runs Shapiro test for diff_score
Re-run the analysis presented in the lecture.
However, have you noticed how chocolate bars seem smaller and smaller
these days?
Well, here at RedCow we also want to introduce smaller bars. Run the
analysis using redcowchoc.csv, but this time the reference value is 30
grams.
Remember to amend the code below to replace NULL.
# import data
mydata2 <- read_csv("redcowchoc.csv")
# run one sample t-test against the reference value of 30g
t.test(mydata2$weight, mu = NULL)
## 7.1 advancing data visualisation: boxplot
#reusing the code for the boxplot in exercise 3, let's tidy up the boxplot by adding some labels.
ggplot(mydata_long, aes(x = time, y = bpm)) +
geom_boxplot() +
labs(title = "Resting Heart Rate After Consumption of RedCow",
x = "Time",
y = "Median Beats per Minute (BPM)") +
theme_classic()
## 7.2 advancing data visualisation: bar chart
#this is example code for a bar chart to look at for now, you'll learn more about how it works in future classes!
ggplot(mydata_long, aes(x = time, y = bpm)) +
geom_bar(stat = "summary", fun = "mean", fill = "lightgrey", width = 0.7) +
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
labs(title = "Resting Heart Rate After Consumption of RedCow",
x = "Time Point",
y = "Mean Beats per Minute (BPM)") +
  theme_classic()
We need to call any packages!
# only install packages if you don't have them installed already
install.packages("emmeans")
install.packages("afex")
install.packages("car")
# otherwise just call the packages
library(tidyverse)
library(afex)
library(rstatix)
library(emmeans)
library(broom)
library(car)
# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)
We need to code our variables.
Any categorical variable needs to be set up as a factor (factor()) and
any scale/numerical variable needs to be set up as numeric
(as.numeric()).
This is an important step because it could cause issues later if
missed.
# now you are working with more complex data sets, we need to make sure variables are set-up correctly. This is quite an important step.
# we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata$group <- factor(mydata$group) #turns group into a factor
mydata$alert <- as.numeric(mydata$alert) # turns alert into numeric
Adapt the code below to produce descriptive statistics (replace
NULL).
descriptives <- NULL %>% # Which data set will you use?
group_by(NULL) %>% # what should you split the file by?
summarise(mean = mean(NULL), sd = sd(NULL)) # mean and standard deviation of the dependent variable
view(NULL)
We can also look at a box plot of our data. Use the code below:
#how about a box plot too
ggplot(mydata, aes(x = group, y = alert, fill = group)) +
geom_boxplot() +
  theme_classic()
Run a one-way independent ANOVA comparing the three groups (lark,
neither, owl) as the independent variable and the alertness score as
dependent variable.
For this exercise you should run a post-hoc test.
#run the ANOVA model
anova_result <- aov_ez(id = "id", # identifier variable
dv = "alert", # dependent variable variable name
between = "group", # between subjects variable name
type = 3, # leave this as `3`
include_aov = TRUE, # leave this as `TRUE`
data = mydata) # name of your data file
# just some code to help tidy the output
anova_result_output <- (anova_result$anova_table) %>%
tidy()
view(anova_result_output)
# we need eta squared for the effect size
install.packages("effectsize")
library(effectsize)
eta_squared(anova_result, partial = FALSE) # asks for eta squared
Post-hoc Options
First we need to run emmeans()
# we need emmeans to run post-hocs
em_means <- emmeans(anova_result, ~ group) # group is the between subjects variable in our data
Then set up the pairwise comparisons and decide which alpha level
adjustment to use.
Pick one of the following:
1. Holm
2. Bonferroni
3. Tukey HSD
pairwise_contrasts <- contrast(em_means, method = "pairwise") # set up pairwise comparisons
print(pairwise_contrasts, adjust = "holm") # applies holm adjustment
print(pairwise_contrasts, adjust = "bonferroni") # applies bonf adjustment
print(pairwise_contrasts, adjust = "tukey") # applies tukey adjustment
For future reference, here is the Games-Howell option.
# Games-Howell is an option for post-hocs when you have unequal variances
games_howell <- mydata %>%
games_howell_test(alert ~ group) # group is the between subjects variable in our data
print(games_howell)
Ordinarily we wouldn’t run multiple analyses on the same data set
(this is considered dodgy and is known as p-hacking).
But, for education purposes I’m going to ask you to break the rules and
re-analyse the same data, this time using a planned contrast.
We will use the same ANOVA model from the previous exercise:
anova_result. The model is the same; we are just going to
use a planned contrast instead of a post-hoc test.
# we need emmeans to run planned contrasts
em_means <- emmeans(anova_result, ~ group) # group is the between subjects variable in our data
Let’s view the levels of your data and the order they’re in. This is important as we have to make sure the order matches our contrasts.
levels(mydata$group) # tells us the order to set up any specified contrasts.
# this order matters as it must match your contrast codes
Simple Contrast
simple_contrast <- contrast(em_means, method = list(
"lark vs neither" = c(-1, 1, 0), # compares the first level to the second level
"lark vs owl" = c(-1, 0, 1))) # compares the first level to the third level
print(simple_contrast, adjust = "holm")
Repeated Contrast
repeated_contrast <- contrast(em_means, method = list(
"lark vs neither" = c(-1, 1, 0), # compares the first level to the second level
"neither vs owl" = c(0, -1, 1))) # compares the second level to the third level
print(repeated_contrast, adjust = "holm")
Helmert Contrast
helmert_contrast <- contrast(em_means, method = list(
"lark vs overall effect later levels" = c(-1, 0.5, 0.5), # first vs. overall effect later levels
"neither vs overall effect later levels" = c(0, -1, 1))) # second vs. overall effect later levels
print(helmert_contrast, adjust = "holm")
Further explanation for coding contrasts
Polynomial Contrast
poly_contrasts <- contrast(em_means, method = "poly") # sets up and runs the polynomial contrast
print(poly_contrasts)
Dunnett Contrast
# Dunnett contrast which will compare a "reference condition" to each of the other conditions.
dunnett_contrasts <- contrast(em_means, method = "trt.vs.ctrl", ref = "lark")
# change "lark" to the name of the reference/control group/condition.
print(dunnett_contrasts)
Use the code below to evaluate both normality and homogeneity of variance.
#testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
# produce QQ-Plot
qqPlot(residuals) #produces a qq-plot of residuals
# produce histogram of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue")
# Shapiro test on residuals
shapiro.test(residuals) #Shapiro test for residuals
# testing homogeneity of variance
leveneTest(alert ~ group, mydata)
Researchers from another lab decided to try and replicate these
findings.
Load the data file “earlybird_replication.csv”.
Run the analysis using a one-way independent ANOVA. Remember to ask for
a box plot.
You can re-use the code from exercise four.
We need to call any packages!
Import the memory.csv data set.
The data set does not appear to be in long/tidy form.
We need to change that using pivot_longer. We will create a
new version of the data but in long format.
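An import sketch for this section (assumes memory.csv is in your working directory and keeps the object name mydata used in the code below):

```r
library(tidyverse)
mydata <- read_csv("memory.csv")
```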
#it looks like the data are not in long format. we need to fix that.
mydata_long <- mydata %>% # creates a new object called mydata_long
pivot_longer(cols = c(year1, year2, year3, year4), # the existing column names for the IV
names_to = "time", # the name of your independent variable
values_to = "score") # the name of your dependent variable
We need to code our variables.
Any categorical variable needs to be set up as a factor (factor()) and any
scale/numerical variable needs to be set up as numeric
(as.numeric()).
This is an important step because it could cause issues later if
missed.
# we need to make sure any categorical variables are coded as a "factor"
# and any scale variables are coded as "numeric"
mydata_long$time = factor(mydata_long$time) #turns time into a factor
mydata_long$score = as.numeric(mydata_long$score) #turns score into numeric
Adapt the code to produce descriptive stats in the same way you have
done previously.
*Hint: Use mydata_long as your data set.
descriptives <- NULL %>% # Which data set will you use?
group_by(NULL) %>% # what should you split the file by?
summarise(mean = mean(NULL), sd = sd(NULL)) # mean and standard deviation of the dependent variable
view(NULL)
Let’s also produce a violin plot to visually present the data.
ggplot(mydata_long, aes(x = time, y = score, fill = time)) +
geom_violin(trim = FALSE) + # violin plot without trimming tails
geom_boxplot(width = 0.1, color = "black", alpha = 0.5) + # overlay a boxplot for additional summary statistics
theme_classic() + # Classic theme
labs(title = "Violin Plot of Memory Score by Group", # add a title
x = "Time Point", # x-axis label
y = "Memory Score") # y-axis label
Run a one-way repeated ANOVA comparing memory score across the four
time points (year1, year2, year3, year4).
Adapt the code below. Change NULL to the relevant
variables/information
anova_result <- aov_ez(id = "NULL",
dv = "NULL",
within = "NULL",
type = 3,
include_aov = TRUE,
data = NULL)
# just some code to help tidy the output
anova_result_output <- (anova_result$anova_table) %>%
tidy()
view(anova_result_output)
# we need eta squared for the effect size
library(effectsize)
eta_squared(anova_result, partial = FALSE) # asks for eta squared
We might also want to print() the ANOVA results, so we
can see whether a Greenhouse-Geisser correction has been
applied.
You can check this using two methods:
1. If the degrees of freedom in the ANOVA output are not whole numbers
and instead have decimal places, then the assumption of sphericity was
not met and the model has automatically adjusted to account for this.
2. You can use the code below to print() the model and it
will tell you if any correction was applied.
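For the second method, the call is simply a print of the model (a minimal sketch; afex reports any sphericity correction in its printed output):

```r
print(anova_result) # the output notes whether a sphericity correction (e.g. Greenhouse-Geisser) was applied
```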
Once you have looked at the ANOVA output and interpreted, if there is
a significant difference you should run planned or post-hoc
comparisons.
For today’s example, we want to use a planned contrast.
First we need to pull out the emmeans().
Change NULL to the independent variable in the model:
time.
# for any contrasts or post-hocs we need emmeans
em_means <- emmeans(anova_result, ~ NULL) # NULL should be the independent variable.
Now let’s run the code for a polynomial contrast.
#let's run a polynomial contrast
poly_contrasts <- contrast(em_means, method = "poly") # sets up and runs the polynomial contrast
print(poly_contrasts)
It might help to visualise this effect.
Maybe a line chart could help…
#let's visualise the data to help our interpretation
ggplot(mydata_long, aes(x = time, y = score, fill = time)) +
stat_summary(fun = mean, geom = "line", color = "black", size = 1, aes(group = 1)) + # line for mean
theme_classic() +
labs(title = "Plot of Memory Score Across Time", # add a title
x = "Time Point", # X-axis label
y = "Memory Score") # Y-axis label
Ordinarily we wouldn’t run multiple analyses on the same data set
(this is considered dodgy and is known as p-hacking).
But, for education purposes I’m going to ask you to break the rules and
re-analyse the same data, this time adding a covariate.
You have five NULL values to change.
Set the covariate to education which is the number of years
of education.
As this is a scale variable (not categorical), set
factorize = FALSE.
# run the ANOVA model but this time with a covariate
# it is now an ANCOVA
# there is a `c` in the object name, it says ancova not anova
ancova_result <- aov_ez(id = "NULL",
dv = "NULL",
within = "NULL",
covariate = "NULL",
factorize = FALSE,
type = 3,
include_aov = TRUE,
data = NULL)
print(ancova_result) # with a covariate in the model it is easier to just print
# we need eta squared for the effect size
library(effectsize)
eta_squared(ancova_result, partial = FALSE) # asks for eta squared
We can run polynomial contrasts on this model too.
Remember it is now called ancova_result and not
anova_result.
That sneaky little c is in the ANCOVA model.
# for any contrasts or post-hocs we need emmeans
em_means <- emmeans(ancova_result, ~ time) # note ancova_result not anova_result
# let's run a polynomial contrast
poly_contrasts <- contrast(em_means, method = "poly") # sets up and runs the polynomial contrast
print(poly_contrasts)
Use the code below to evaluate both normality and homogeneity of
variance.
This will run it for the anova_result model from exercise
four.
You can just add a c into the code to also check for the
ANCOVA model.
#testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
# produce QQ-Plot
qqPlot(residuals) #produces a qq-plot of residuals
# produce histogram of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue")
# Shapiro test on residuals
shapiro.test(residuals) # Shapiro test for residuals
We need to call any packages!
Import the memory.csv data set.
Check the file.
# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)
view(mydata)
We need to code our variables.
# we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata$notification = factor(mydata$notification) #turns variable into a factor
mydata$usage = factor(mydata$usage, levels = c("rare", "regular", "problematic")) # turns into a factor and orders levels
mydata$anxiety = as.numeric(mydata$anxiety) # turns variable into numeric
Why did we order the usage factor?
R will automatically put factor levels in alphabetical order, so I wanted
to change these so they’re in the order of lowest to highest usage
(i.e. rare, regular, then problematic as the third level).
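You can confirm the ordering with levels(). A self-contained toy example (the usage vector here is a hypothetical stand-in, not your data):

```r
# toy vector standing in for mydata$usage
usage <- factor(c("regular", "rare", "problematic", "rare"),
                levels = c("rare", "regular", "problematic"))
levels(usage) # returns the levels in the order we specified, not alphabetical order
```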
Generate descriptives.
As we have a factorial design, we have two categorical variables to
include in the group_by() function. We can also ask for
n() so we know how many participants are in each group.
# we have a factorial design so we need to group by BOTH independent variables
# we will also ask for n() to tell us about the group sizes
# we will also start using standard error (se)
descriptives <- mydata %>%
group_by(notification, usage) %>%
summarise(m = mean(anxiety),
sd = sd(anxiety),
n = n(),
se = sd/sqrt(n))
view(descriptives)
A box plot might also be useful.
Also try out ggsave().
You should find the file in your working directory.
#let's also visualise our data with a box plot to see what is going on.
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
geom_boxplot() +
theme_classic()
ggsave("box_plot.jpg") # this will save your boxplot into your working directory, as a .jpg file.
You might prefer a violin plot.
# let's also visualise our data with a violin plot to see what is going on.
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
geom_violin(trim = FALSE, alpha = 0.1) + # Violin plot without trimming tails
geom_boxplot(width = 0.1, color = "black", alpha = 0.5, position = position_dodge(.9)) + #Overlay a boxplot
theme_minimal() + # Minimal theme
labs(title = "Violin Plot of Anxiety Scores by Notification and Use", # Add a title
x = "Usage", # X-axis label
y = "Anxiety Score") # Y-axis label
ggsave("violin_plot.jpg") # this will save your violin plot into your working directory, as a .jpg file.
Run a 3x2 factorial ANOVA.
#we can run the ANOVA using aov_ez()
anova_result <- aov_ez(id = "id",
between = c("notification", "usage"),
dv = "anxiety",
type = 3,
include_aov = TRUE,
data = mydata)
factorial_output <- summary(anova_result) # this will organise the output ready for viewing
print(factorial_output) # print it to the console
library(effectsize)
eta_squared(anova_result, partial = TRUE) # asks for partial eta squared
If there is a significant interaction, we need to
break it down. A great starting point is to plot it.
You can use your box plot/violin plot, or a line graph, which is a
helpful way to interpret an interaction.
#let's plot it first
interaction.plot(mydata$usage, # plot the x-axis
mydata$notification, # plot the different lines
mydata$anxiety, # state the dependent variable
fun = mean) # ask for the mean values
Now let’s check out what is driving this interaction.
# first we need to get the emmeans
# then we need to ask for cohen's d too
posthoc_factorial <- emmeans(anova_result,
pairwise ~ notification| usage,
adjust = "holm")
output <- posthoc_factorial$contrasts %>%
summary() %>%
mutate(cohen_d = effectsize::t_to_d(t.ratio, df))
view(output)
There was a significant main effect of usage.
We could also interpret this, however, the interaction is the most
important thing.
If you were to interpret the main effects (where there was no
interaction), you can use the exact same code and steps as you used for
a one-way ANOVA.
# we need emmeans to run post-hocs
em_means <- emmeans(anova_result, ~ NULL) # change NULL to the significant main effect variable (usage)
Then set up the pairwise comparisons and decide which alpha level
adjustment to use.
Pick one of the following:
1. Holm
2. Bonferroni
3. Tukey HSD
pairwise_contrasts <- contrast(em_means, method = "pairwise") # set up pairwise comparisons
print(pairwise_contrasts, adjust = "holm") # applies holm adjustment
print(pairwise_contrasts, adjust = "bonferroni") # applies bonf adjustment
print(pairwise_contrasts, adjust = "tukey") # applies tukey adjustment
For future reference, here is the Games-Howell option.
# Games-Howell is an option for post-hocs when you have unequal variances
games_howell <- mydata %>%
games_howell_test(anxiety ~ usage) # usage is a between subjects variable in this data set
print(games_howell)
# testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
qqPlot(residuals) #produces a qq-plot of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") #histogram of residuals
shapiro.test(residuals) # Shapiro test for residuals
Arghhhhh, A VIOLATION! Is this a problem?
Well, Field et al. (2009) notes that an ANOVA is robust to violations of
both normality and homogeneity IF…
…sample sizes are equal. We can check this either by looking back at
n() with your descriptives or by using the code below to
count:
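One way to count the group sizes (a sketch using dplyr's count(), assuming the mydata object from this section):

```r
# counts the number of rows for each combination of the two factors
mydata %>%
  count(notification, usage)
```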
Let’s not forget homogeneity of variance.
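For homogeneity of variance, a sketch using leveneTest() from the car package loaded earlier (testing across the factorial cells via the interaction formula is one reasonable choice):

```r
# Levene's test across all notification-by-usage cells
leveneTest(anxiety ~ notification * usage, data = mydata)
```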
# Optional code: try out the code below to produce different types of plots.
# plot 1
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
geom_boxplot(outlier.shape = NA, color = "black", width = 0.6,
alpha = 0.8, notch = FALSE) + # Boxplot with black outline and transparency
theme_classic(base_size = 16) + # Classic theme with increased base font size
theme(
plot.title = element_text(hjust = 0.5, size = 20, face = "bold", color = "black"), # Centered, bold title
axis.title.x = element_text(size = 16, face = "bold", color = "black"), # Bold and larger x-axis title
axis.title.y = element_text(size = 16, face = "bold", color = "black"), # Bold and larger y-axis title
axis.text.x = element_text(size = 14, color = "black"), # Larger x-axis text
axis.text.y = element_text(size = 14, color = "black"), # Larger y-axis text
legend.title = element_text(size = 16, face = "bold", color = "black"), # Bold legend title
legend.text = element_text(size = 14, color = "black"), # Larger legend text
panel.grid.major = element_line(color = "grey85", linetype = "dotted"), # Light grey dotted major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
panel.background = element_rect(fill = "white", color = NA), # White background for the panel
plot.background = element_rect(fill = "white", color = NA), # White background for the entire plot
legend.position = "right", # Position the legend to the right
legend.key = element_rect(fill = "white", color = NA) # White background for legend keys
) +
scale_fill_brewer(palette = "Set3") + # Use a colorblind-friendly palette
labs(
title = "Box Plot of Anxiety Scores by Usage and Notification Type", # Clear title
x = "Usage Type", # X-axis label
y = "Anxiety Score", # Y-axis label
fill = "Notification Type" # Legend title
) +
coord_flip() # Optional: Flip coordinates for a horizontal box plot
#plot 2
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
geom_boxplot(outlier.shape = NA, color = "black", width = 0.6, alpha = 0.7) + # Boxplot with no outliers and transparency
theme_minimal(base_size = 16) + # Minimal theme with increased base font size
theme(
plot.title = element_text(hjust = 0.5, size = 24, face = "bold", color = "black"), # Centered, bold title
axis.title.x = element_text(size = 18, face = "bold", color = "black"), # Bold and larger x-axis title
axis.title.y = element_text(size = 18, face = "bold", color = "black"), # Bold and larger y-axis title
axis.text.x = element_text(size = 14, color = "black"), # Larger x-axis text
axis.text.y = element_text(size = 14, color = "black"), # Larger y-axis text
legend.title = element_text(size = 16, face = "bold", color = "black"), # Bold legend title
legend.text = element_text(size = 14, color = "black"), # Larger legend text
panel.grid.major = element_line(color = "grey85", linetype = "dotted"), # Light grey dotted major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
panel.background = element_rect(fill = "white", color = NA), # White background for the panel
plot.background = element_rect(fill = "white", color = NA), # White background for the entire plot
legend.position = "right", # Position the legend to the right
legend.key = element_rect(fill = "white", color = NA) # White background for legend keys
) +
scale_fill_manual(values = c("#0072B2", "#D55E00", "#009E73")) + # Custom color palette for distinction and colorblind friendliness
labs(
title = "Anxiety Scores by Usage Type and Notification", # Clear and concise title
x = "Usage Type", # X-axis label
y = "Anxiety Score", # Y-axis label
fill = "Notification Type" # Legend title
) +
coord_flip() # Optional: Flip coordinates for a horizontal box plot
# plot 3
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
geom_violin(trim = FALSE, alpha = 0.2, color = "black", lwd = 0.1) + # Smooth violin plot with transparency and black outline
geom_boxplot(width = 0.15, color = "black", alpha = 0.5, position = position_dodge(0.9)) + # Overlay a slim boxplot for additional statistics
stat_summary(fun = mean, geom = "point", shape = 20, size = 4, color = "black", position = position_dodge(0.9)) + # Overlay mean points
theme_minimal(base_size = 16) + # Minimal theme with increased base font size
theme(
plot.title = element_text(hjust = 0.5, size = 24, face = "bold", color = "black"), # Centered, bold title
axis.title.x = element_text(size = 18, face = "bold", color = "black"), # Bold and larger x-axis title
axis.title.y = element_text(size = 18, face = "bold", color = "black"), # Bold and larger y-axis title
axis.text.x = element_text(size = 14, color = "black"), # Larger x-axis text
axis.text.y = element_text(size = 14, color = "black"), # Larger y-axis text
legend.title = element_text(size = 16, face = "bold", color = "black"), # Bold legend title
legend.text = element_text(size = 14, color = "black"), # Larger legend text
panel.grid.major = element_line(color = "grey85", linetype = "dotted"), # Light grey dotted major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
panel.background = element_rect(fill = "white", color = NA), # White background for the panel
plot.background = element_rect(fill = "white", color = NA), # White background for the entire plot
legend.position = "right", # Position the legend to the right
legend.key = element_rect(fill = "white", color = NA) # White background for legend keys
) +
scale_fill_manual(values = c("#0072B2", "#D55E00", "#009E73")) + # Custom color palette for distinction and colorblind friendliness
labs(
title = "Anxiety Scores by Usage Type and Notification", # Clear and concise title
x = "Usage Type", # X-axis label
y = "Anxiety Score", # Y-axis label
fill = "Notification Type" # Legend title
  )
Load relevant packages and import this week’s data file.
Check the file.
# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)
view(mydata)
That file does not look very tidy…
#it looks like the data are not in long tidy format. we need to fix that.
mydata_long <- mydata %>%
pivot_longer(
cols = starts_with("before") | starts_with("after"), # columns to pivot, anything that starts with "before" or "after"
names_to = c("time", "exercise"), # names of new columns
names_pattern = "(before|after)_(run|swim)", # pattern to separate columns
values_to = "wellbeing" # Name of the value column
)
view(mydata_long) # take a look at what the above code has done. nice and tidy!
Recode any variables as necessary.
#we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata_long$time = factor(mydata_long$time, levels = c("before", "after")) #turns time into a factor, changes level.
mydata_long$exercise = factor(mydata_long$exercise) #turns exercise into a factor
mydata_long$wellbeing = as.numeric(mydata_long$wellbeing) #turns wellbeing score into numeric
Generate descriptives.
As we have a factorial design, we have two categorical variables to
include in the group_by() function. We can also ask for
n() so we know how many participants are in each group.
# we have a factorial design so we need to group by BOTH independent variables
# we will also ask for n() to tell us about the group sizes
# we will also start using standard error (se)
descriptives <- mydata_long %>%
group_by(time, exercise) %>%
summarise(m = mean(wellbeing),
sd = sd(wellbeing),
n = n(),
se = sd/sqrt(n))
view(descriptives)
How would you amend the code above to look at the descriptives comparing before vs. after, ignoring exercise type?
A box plot might also be useful.
Also try out ggsave().
You should find the file in your working directory.
#let's also visualise our data with a box plot to see what is going on.
ggplot(mydata_long, aes(x = time, y = wellbeing, fill = exercise)) +
geom_boxplot() +
theme_classic()
ggsave("box_plot.jpg")
Let’s try a bar chart…
Don’t worry about understanding all of this code.
You’ll have a bit of practice with ggplot in a future workshop.
#don't let this code scare you. But run it to produce a bar chart!
# you'll learn more about ggplot soon.
bar_chart <- ggplot(mydata_long, aes(x = time, y = wellbeing, fill = exercise, group = interaction(time, exercise))) +
stat_summary(
fun = mean,
geom = "bar",
position = position_dodge(width = 0.9)
) +
stat_summary(
fun.data = mean_se, # Calculate mean and standard error for error bars
geom = "errorbar", # Use error bars
width = 0.2,
position = position_dodge(width = 0.9)
) +
scale_y_continuous(breaks = seq(0, 30, 2)) + # Set breaks without limiting the scale
coord_cartesian(ylim = c(0, 22)) + # Zoom into the y-axis range from 0 to 22
xlab("Time") +
ylab("Mean Wellbeing Score") +
scale_fill_brewer(palette = "Dark2") +
theme_classic(base_size = 12, base_family = "Arial") +
theme(
legend.position = "bottom",
legend.title = element_blank(),
legend.key = element_rect(fill = "white", colour = "white")
) +
labs(
caption = "Figure 1. Mean wellbeing scores before and after different exercises. Error bars represent standard errors."
)
print(bar_chart)
ggsave("bar_chart.jpg") # this will save your bar chart plot into your working directory, as a .jpg file.
Run a 2x2 factorial ANOVA.
#we can run the ANOVA using aov_ez() from the afex package
library(afex)
anova_result <- aov_ez(id = "id",
within = c("time", "exercise"),
dv = "wellbeing",
type = 3,
data = mydata_long)
factorial_output <- summary(anova_result)
print(factorial_output)
library(effectsize)
eta_squared(anova_result, partial = TRUE) #ask for partial eta squared
If there is a significant interaction, we need to
break it down. A great starting point is to plot it.
Either use your box plot/violin plot,
or a line graph, which is a helpful way to interpret an interaction.
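A sketch of such a line graph built with stat_summary(), using a small made-up stand-in for mydata_long so it runs on its own (with your data, swap toy_long for mydata_long):

```r
library(ggplot2)

# hypothetical toy data standing in for mydata_long
toy_long <- data.frame(
  time      = factor(rep(c("before", "after"), each = 4),
                     levels = c("before", "after")),
  exercise  = rep(c("run", "swim"), times = 4),
  wellbeing = c(5, 4, 6, 5, 8, 7, 9, 8)
)

# one line per exercise type, with points and lines at the group means
line_plot <- ggplot(toy_long,
                    aes(x = time, y = wellbeing,
                        colour = exercise, group = exercise)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  theme_classic()

print(line_plot)
```

Non-parallel lines are the visual signature of an interaction.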
#let's plot it first
interaction.plot(mydata_long$exercise, # plot the x-axis
mydata_long$time, # plot the different lines
mydata_long$wellbeing, # state the dependent variable
fun = mean) # ask for the mean values
Now let’s check out what is driving this interaction.
# it looks pretty clear that wellbeing score is higher after compared to before.
# but which sees the biggest increase, running or swimming?
# to do this we need to compare running vs. swimming, at the different time levels
# this is the simple effects analysis.
# we need `pairwise ~ WHAT WE WANT TO COMPARE | WHAT VARIABLE WE WANT TO LOOK ACROSS`...
# or `pairwise ~ exercise| time`, see below:
#first we need to get the emmeans
#then we need to ask for cohen's d too
library(emmeans) #emmeans() comes from the emmeans package
posthoc_factorial <- emmeans(anova_result,
                             pairwise ~ exercise | time,
                             adjust = "holm")
output <- posthoc_factorial$contrasts %>%
summary() %>%
mutate(cohen_d = effectsize::t_to_d(t.ratio, df))
view(output)
What about sphericity? Well, an important announcement:
SPHERICITY DOES NOT APPLY WHEN THERE ARE ONLY TWO LEVELS
This was a 2x2 design. Had it been 3x2 or 3x3, then we’d need to consider
sphericity… …which you will do next week.
Evaluate the model…
# testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
library(car) #qqPlot() comes from the car package
qqPlot(residuals) #produces a qq-plot of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") #histogram of residuals
shapiro.test(residuals) #Shapiro test for residuals
Take the time to write up your results.
Use APA format and remember to present descriptive statistics.
It is important to practice this because you’ll soon be doing this for
your lab report!
Download the .csv file from Moodle and conduct the appropriate analysis, following the process below:
library(tidyverse)
library(afex)
library(emmeans)
#it looks like the data are not in long or tidy format. we need to fix that.
mydata_long <- mydata %>%
pivot_longer(
cols = word_list:problem_solving,
names_to = "task_type",
values_to = "score")
#we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata_long$music = factor(mydata_long$music, levels = c("silence", "classical", "pop")) #turns music into a factor
mydata_long$task_type = factor(mydata_long$task_type, levels = c("word_list", "comprehension", "problem_solving")) #turns task_type into a factor
mydata_long$score = as.numeric(mydata_long$score) #turns score into numeric
Replace NULL where necessary.
You have used this code previously.
descriptives <- NULL %>%
group_by(NULL, NULL) %>%
summarise(m = mean(NULL),
sd = sd(NULL),
n = n(),
se = sd/sqrt(n))
anova_result <- aov_ez(id = "NULL",
within = "NULL",
between = "NULL",
dv = "NULL",
type = 3,
data = NULL)
#tidy the output
library(broom) #tidy() comes from the broom package
anova_result_output <- anova_result$anova_table %>%
tidy()
view(anova_result_output)
#what if you want the sphericity information
factorial_output <- summary(anova_result)
print(factorial_output)
interaction.plot(mydata_long$NULL, # plot the x-axis
mydata_long$NULL, # plot the different lines
mydata_long$NULL, # state the dependent variable
fun = mean) # ask for the mean values
emmeans_task <- emmeans(anova_result, ~ NULL | NULL)
contrast_task <- contrast(emmeans_task, method = "pairwise", adjust = "holm")
summary(contrast_task)
#testing normality, looking at normal distribution of residuals from the model.
library(car) #qqPlot() and leveneTest() come from the car package
residuals <- residuals(anova_result) #pulls residuals from the model
qqPlot(residuals) #produces a qq-plot of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") #histogram of residuals
shapiro.test(residuals) #Shapiro test for residuals
leveneTest(score ~ music, mydata_long) #Levene's test for homogeneity of variance
Check the file and code any variables.
# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)
#we need to code variables and re-order them to a meaningful order.
mydata$time = factor(mydata$time, levels = c("morning", "afternoon")) #turns into a factor
mydata$drink = factor(mydata$drink, levels = c("coffee", "tea")) #turns into a factor
mydata$score = as.numeric(mydata$score) #turns into numeric
A note on ggplot2:
ggplot2 is a package that comes within the tidyverse package bundle.
ggplot2 is a very powerful data visualisation package.
This workshop will only give you an introduction to the basics…
…particularly the style of figures you might use for lab report
one.
However, there is so much more you can do with ggplot2 and there are
lots of resources online.
# step by step
# layer 1: specify data and aesthetics using aes()
ggplot(mydata, aes(x = NULL, y = NULL, fill = NULL))+
# layer 2: present the values using stat_summary()
stat_summary(fun = mean, geom = "bar", position = position_dodge(), color = "black")+
# layer 3: add error bars
stat_summary(fun.data = mean_se, geom = "errorbar",
position = position_dodge(0.9), width = 0.25) +
# layer 4: labels with labs()
labs(
title = NULL, # leave this as NULL because APA figures do not use titles.
x = "NULL",
y = "NULL")+
# layer 5: group/condition names and colours
scale_fill_manual(values = c("morning" = "slategray4", "afternoon" = "slategray2"),
labels = c("Morning", "Afternoon")) +
scale_x_discrete(labels = c("coffee" = "Coffee", "tea" = "Tea"))+
# layer 6: add the theme to tidy up
theme_minimal(base_size = 14) +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(color = "black", fill = NA, size = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.title = element_text(face = "bold"),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"))
You can also run the code below to produce different types of
figures.
You might choose to use this code when you come to write your lab
report.
# violin plot (with box plot) complete code example
# you might use and tweak this code for your lab report.
ggplot(mydata, aes(x = drink, y = score, fill = time)) +
geom_violin(trim = TRUE, scale = "area", alpha = 0.2, color = "white", size = 0.1) +
geom_boxplot(width = 0.2, position = position_dodge(0.9), color = "black", alpha = 0.8, outlier.shape = 1) +
labs(
title = NULL,
x = "Drink",
y = "Productivity Score",
fill = "Time of Day") + # legend title
scale_fill_manual(
values = c("morning" = "#1f77b4", "afternoon" = "#ff7f0e"),
labels = c("Morning", "Afternoon")) + # legend labels
scale_x_discrete(
labels = c("coffee" = "Coffee", "tea" = "Tea")) + # x-axis labels
theme_classic(base_size = 14) +
theme(
panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank(), # Remove minor gridlines
panel.border = element_rect(color = "black", fill = NA, size = 1),
plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
legend.title = element_text(face = "bold"),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
axis.text.y = element_text(size = 12, face = "bold")) # bold y-axis text
# box plot code example
# you might use and tweak this code for your lab report.
ggplot(mydata, aes(x = drink, y = score, fill = time)) +
geom_boxplot(
width = 0.5,
alpha = 0.7,
color = "black",
size = 0.2) +
# Customizing labels and themes
labs(
title = NULL,
x = "Drink Type",
y = "Productivity Score",
fill = "Time") + # legend title
scale_fill_manual(
values = c("morning" = "#1f77b4", "afternoon" = "#ff7f0e"),
labels = c("Morning", "Afternoon")) +
scale_x_discrete(labels = c("coffee" = "Coffee", "tea" = "Tea")) + # x-axis labels
# Ensure the y-axis starts at zero
coord_cartesian(ylim = c(0, NA)) +
theme_minimal(base_size = 16) +
theme(
panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank(), # Remove minor gridlines
panel.border = element_rect(color = "gray80", fill = NA, size = 0.5),
plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
legend.title = element_text(face = "bold", size = 14),
legend.position = "top",
axis.title.x = element_text(face = "bold", size = 14),
axis.title.y = element_text(face = "bold", size = 14),
axis.text.x = element_text(size = 12, face = "bold"),
axis.text.y = element_text(size = 12))
Below are the figures you should have been able to produce: